Overview

Dataset Statistics

Number of Variables 5
Number of Rows 3095
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 698.9 KB
Average Row Size in Memory 231.2 B
Variable Types
  • Numerical: 2
  • Categorical: 3
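
The size figures above can be cross-checked against each other. A minimal sketch, assuming the report's "KB" means 1024 bytes (the report does not state its unit convention), that recovers the average row size from the total:

```python
# Values copied from the Dataset Statistics above.
N_ROWS = 3095
TOTAL_KB = 698.9

# Assumption: 1 KB = 1024 bytes; the report does not say which convention it uses.
BYTES_PER_KB = 1024

avg_row_bytes = TOTAL_KB * BYTES_PER_KB / N_ROWS
print(f"{avg_row_bytes:.1f} B per row")
```

Under that assumption the result rounds to the reported 231.2 B; with 1000 bytes per KB it would not.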

Dataset Insights

index is uniformly distributed (Uniform)
index is skewed (Skewed)
seller_zip_code_prefix is skewed (Skewed)
seller_id has a high cardinality: 3095 distinct values (High Cardinality)
seller_city has a high cardinality: 611 distinct values (High Cardinality)
seller_id has constant length 32 (Constant Length)
seller_state has constant length 2 (Constant Length)
seller_id has all distinct values (Unique)

Variables


index

numerical

Approximate Distinct Count 3095
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 49520 B
Mean 1547
Minimum 0
Maximum 3094
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • index is uniformly distributed

Quantile Statistics

Minimum 0
5-th Percentile 154.7
Q1 773.5
Median 1547
Q3 2320.5
95-th Percentile 2939.3
Maximum 3094
Range 3094
IQR 1547

Descriptive Statistics

Mean 1547
Standard Deviation 893.5939
Variance 798510
Sum 4.788e+06
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.5776
  • index is not normally distributed (p-value 3.3627e-10)
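
The statistics reported for index are what one would expect from a plain row counter. A minimal sketch, assuming index is exactly 0, 1, ..., 3094 (an assumption consistent with the minimum, maximum, and distinct count above), that recomputes them with the Python standard library. The `percentile` helper is illustrative and uses linear interpolation, which may differ from the report's own method:

```python
import statistics

# Assumption: index is the row counter 0, 1, ..., 3094.
values = list(range(3095))
n = len(values)

mean = statistics.fmean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation

def percentile(sorted_values, p):
    """Linearly interpolated percentile for p in [0, 1] (illustrative helper)."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

p05, q1 = percentile(values, 0.05), percentile(values, 0.25)
q3, p95 = percentile(values, 0.75), percentile(values, 0.95)
iqr = q3 - q1

# Population central moments for skewness and excess kurtosis.
m2 = sum((v - mean) ** 2 for v in values) / n
m3 = sum((v - mean) ** 3 for v in values) / n
m4 = sum((v - mean) ** 4 for v in values) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(mean, variance, round(std, 4), round(cv, 4))
print(p05, q1, q3, p95, iqr)
print(skewness, round(excess_kurtosis, 1))
```

The reported variance of 798510 matches the sample variance (n - 1 denominator) rather than the population variance, and the kurtosis of -1.2 is the excess kurtosis of a uniform distribution.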

seller_id

categorical

Approximate Distinct Count 3095
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 300215 B

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 3442f8959a84dea7ee...
2nd row d1b65fc7debc3361ea...
3rd row ce3ad9de960102d067...
4th row c0f3eea2e14555b6fa...
5th row 51a04a8a6bdcb23dec...

Letter

Count 37356
Lowercase Letter 37356
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 61684
  • seller_id contains many words: 3095 words
  • seller_id has words of constant length
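
The character counts for seller_id can be checked against the row count and the constant length. A minimal sketch using only figures from this report; the comparison with hexadecimal strings is an interpretation suggested by the samples, not something the report states:

```python
# Values copied from the report.
N_ROWS = 3095
ID_LENGTH = 32    # constant length reported for seller_id
LETTERS = 37356   # "Lowercase Letter" count
DIGITS = 61684    # "Decimal Number" count

total_chars = N_ROWS * ID_LENGTH

# Every character is either a lowercase letter or a decimal digit.
assert LETTERS + DIGITS == total_chars

letter_share = LETTERS / total_chars
print(total_chars, round(letter_share, 3))
```

A letter share close to 6/16 = 0.375 is what uniformly random hexadecimal strings would give, since six of the sixteen hex symbols (a to f) are letters.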

seller_zip_code_prefix

numerical

Approximate Distinct Count 2246
Approximate Unique (%) 72.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 49520 B
Mean 32291.0595
Minimum 1001
Maximum 99730
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • seller_zip_code_prefix is skewed right (γ1 = 0.9156)

Quantile Statistics

Minimum 1001
5-th Percentile 2464.3
Q1 7093.5
Median 14940
Q3 64552.5
95-th Percentile 89290.1
Maximum 99730
Range 98729
IQR 57459

Descriptive Statistics

Mean 32291.0595
Standard Deviation 32713.4538
Variance 1.0702e+09
Sum 9.9941e+07
Skewness 0.9156
Kurtosis -0.8588
Coefficient of Variation 1.0131
  • seller_zip_code_prefix is not normally distributed (p-value 7.0492e-09)
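
Several of the figures for seller_zip_code_prefix are derived from others in the same tables. A minimal sketch that recomputes them from the reported values only, so the results agree with the report up to its rounding:

```python
# Values copied from the report.
N_ROWS = 3095
MEAN, STD = 32291.0595, 32713.4538
MINIMUM, MAXIMUM = 1001, 99730
Q1, MEDIAN, Q3 = 7093.5, 14940, 64552.5

value_range = MAXIMUM - MINIMUM  # 98729
iqr = Q3 - Q1                    # 57459.0
cv = STD / MEAN                  # coefficient of variation
variance = STD ** 2
total = MEAN * N_ROWS            # sum of all values

# The mean lying well above the median is consistent with the reported right skew.
mean_above_median = MEAN > MEDIAN

print(value_range, iqr, round(cv, 4))
print(f"{variance:.4e} {total:.4e} {mean_above_median}")
```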

seller_city

categorical

Approximate Distinct Count 611
Approximate Unique (%) 19.7%
Missing 0
Missing (%) 0.0%
Memory Size 232783 B
  • The largest value (sao paulo) is over 5.46 times larger than the second largest value (curitiba)

Length

Mean 10.167
Standard Deviation 4.0093
Median 9
Minimum 2
Maximum 40

Sample

1st row campinas
2nd row mogi guacu
3rd row rio de janeiro
4th row sao paulo
5th row braganca paulista

Letter

Count 29108
Lowercase Letter 29108
Space Separator 2312
Uppercase Letter 0
Dash Punctuation 5
Decimal Number 8
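
The labels in the Letter tables match the names of Unicode general categories. A minimal sketch, assuming the report tallies characters this way (the report does not document its method), applied to the five sample city names above. `count_categories` is a hypothetical helper, not part of any profiling library:

```python
import unicodedata
from collections import Counter

# Unicode general category codes behind the report's labels.
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def count_categories(strings):
    """Count characters per labelled category; other categories are ignored."""
    counts = Counter({label: 0 for label in CATEGORY_LABELS.values()})
    for s in strings:
        for ch in s:
            label = CATEGORY_LABELS.get(unicodedata.category(ch))
            if label is not None:
                counts[label] += 1
    return counts

# The five sample rows shown for seller_city.
sample_cities = [
    "campinas",
    "mogi guacu",
    "rio de janeiro",
    "sao paulo",
    "braganca paulista",
]

counts = count_categories(sample_cities)
for label, value in counts.items():
    print(label, value)
```

Run over the full seller_city column instead of the five samples, the same tally should reproduce the table above, provided the assumption about the method holds.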

seller_state

categorical

Approximate Distinct Count 23
Approximate Unique (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory Size 207365 B
  • The largest value (SP) is over 5.3 times larger than the second largest value (PR)

Length

Mean 2
Standard Deviation 0
Median 2
Minimum 2
Maximum 2

Sample

1st row SP
2nd row SP
3rd row RJ
4th row SP
5th row SP

Letter

Count 6190
Lowercase Letter 0
Space Separator 0
Uppercase Letter 6190
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (SP, PR) take over 50.0%
  • seller_state has words of constant length

Interactions

Correlations

Missing Values